A study on invariance of f-divergence and its application to speech recognition

Authors

Yu Qiao

Nobuaki Minematsu

Abstract

Identifying features invariant to certain transformations is a fundamental problem in the fields of signal processing and pattern recognition. This paper explores a family of measures called f-divergences that are invariant to invertible transformations, and studies their application to speech recognition. We provide novel proofs for the sufficiency and necessity of the invariance of f-divergence. Several techniques to calculate or approximate f-divergences in general cases and for special distributions such as Gaussian and Gaussian mixture are reviewed. We show how to construct an invariant structural representation from sequence data through maximum likelihood decomposition, and prove the invariance of this decomposition. We demonstrate an application of this invariant representation to recognizing connected Japanese vowel utterances. In addition, we propose several techniques to improve the recognition performance. The experimental results show that the invariant structure achieves better performance than Hidden Markov Models, a widely used technique for acoustic modeling of speech sounds.
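The invariance property described in the abstract can be checked numerically on a small case. The sketch below is an illustration only and is not taken from the paper: it uses the Kullback-Leibler divergence (one member of the f-divergence family, with f(t) = t log t) between two Gaussians, for which a closed form exists, and applies an invertible affine map x -> Ax + b to both distributions. The matrix A, the offset b, and the helper names kl_gaussian and affine_transform are assumptions chosen for the example.

```python
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0)
        + diff @ cov1_inv @ diff
        - k
        + logdet1
        - logdet0
    )


def affine_transform(mu, cov, A, b):
    """Parameters of the Gaussian obtained by mapping x -> A x + b."""
    return A @ mu + b, A @ cov @ A.T


# Two arbitrary 3-dimensional Gaussians (illustrative values only).
rng = np.random.default_rng(0)
mu_p, mu_q = rng.normal(size=3), rng.normal(size=3)
m_p, m_q = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
cov_p = m_p @ m_p.T + np.eye(3)  # symmetric positive definite
cov_q = m_q @ m_q.T + np.eye(3)

# A hypothetical invertible transformation (det(A) = 5).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
b = np.array([0.5, -1.0, 2.0])

kl_before = kl_gaussian(mu_p, cov_p, mu_q, cov_q)
kl_after = kl_gaussian(*affine_transform(mu_p, cov_p, A, b),
                       *affine_transform(mu_q, cov_q, A, b))

print(kl_before, kl_after)  # the two values should agree up to rounding
```

The same check could be repeated with other f-divergences that have Gaussian closed forms, such as the Bhattacharyya-based Hellinger distance; the affine case is only a special instance of the invertible transformations the abstract refers to.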

Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

Journal:

IEEE Trans. Signal Processing

Volume 58

Publication date: 2010
